Random forests with random projections of the output space for high dimensional multi-label classification
We adapt the idea of random projections applied to the output space, so as to
enhance tree-based ensemble methods in the context of multi-label
classification. We show how learning time complexity can be reduced without
affecting the computational complexity or accuracy of predictions. We also show
that random output space projections may be used to reach different
bias-variance tradeoffs over a broad panel of benchmark problems, and that
this may lead to improved accuracy while significantly reducing the
computational burden of the learning stage.
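The projection step mentioned in this abstract can be illustrated with a small sketch that compresses a binary label matrix using a Gaussian random projection. This is only an illustration under stated assumptions, not the authors' implementation: the function names `gaussian_projection_matrix` and `project_labels`, the toy label matrix, and the 1/sqrt(m) scaling are hypothetical choices, and the tree-growing stage is omitted entirely.

```python
import random


def gaussian_projection_matrix(n_labels, n_components, seed=0):
    """Build an (n_components x n_labels) matrix with N(0, 1/n_components) entries."""
    rng = random.Random(seed)  # local generator so results are reproducible
    scale = 1.0 / n_components ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_labels)]
            for _ in range(n_components)]


def project_labels(Y, P):
    """Map each label vector y (length n_labels) to P @ y (length n_components)."""
    return [[sum(p * y_j for p, y_j in zip(row, y)) for row in P] for y in Y]


# Toy data (assumed): 4 samples, 6 binary labels, projected down to 3 dimensions.
Y = [
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
]
P = gaussian_projection_matrix(n_labels=6, n_components=3, seed=42)
Z = project_labels(Y, P)
print(len(Z), "samples,", len(Z[0]), "projected outputs each")
```

In a setup like the one the abstract describes, a learner would then be trained on the lower-dimensional targets `Z` instead of `Y`, which is where a saving in learning time could come from.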
PMG: Multi-core metabolite identification
Distributed computing has been considered for decades as a promising way of speeding up software execution, resulting in a valuable collection of safe and efficient concurrent algorithms. With the spread of multi-core processors, parallelization has moved to the center of attention, bringing new challenges, especially regarding scalability to tens or even hundreds of parallel cores. In this paper, we present a scalable multi-core tool for the metabolomics community. This tool addresses the problem of metabolite identification, which is currently a bottleneck in the metabolomics pipeline. Analytical BioScience
Novel techniques for automorphism group computation
Graph automorphism (GA) is a classical problem, in which the objective is to compute the automorphism group of an input graph.
In this work we propose four novel techniques to speed up algorithms that solve the GA problem by exploring a search tree. They improve performance by reducing the depth of the search tree and by pruning it effectively.
We formally prove that a GA algorithm that uses these techniques correctly computes the automorphism group of the input graph. We also describe how the techniques have been incorporated into the GA algorithm conauto, as conauto-2.03, with at most an additive polynomial increase in its asymptotic time complexity.
We have experimentally evaluated the impact of each of the above techniques on several graph families. We have observed that each of the techniques by itself significantly reduces the number of processed nodes of the search tree in some subset of graphs, which justifies the use of each of them. Then, when they are applied together, their effect is combined, leading to reductions in the number of processed nodes in most graphs. This is also reflected in a reduction of the running time, which is substantial for some graph families.
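For readers unfamiliar with the GA problem itself, the following sketch enumerates the automorphisms of a tiny graph by brute force. It is not conauto and uses none of the four techniques from the abstract: it simply tries every vertex permutation, which is only feasible for very small graphs. The function name `automorphisms` and the example graphs are assumptions chosen for illustration.

```python
from itertools import permutations


def automorphisms(n, edges):
    """Return all permutations of range(n) that map the edge set onto itself.

    Assumes a simple undirected graph with vertices 0..n-1 and no duplicate edges.
    """
    edge_set = {frozenset(e) for e in edges}
    found = []
    for perm in permutations(range(n)):
        # Apply the permutation to both endpoints of every edge.
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges}
        # A bijection that preserves the edge set also preserves non-edges.
        if mapped == edge_set:
            found.append(perm)
    return found


# A 4-cycle: its automorphism group is the dihedral group of order 8.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(len(automorphisms(4, cycle4)))  # prints 8
```

The n! permutations tried here are exactly the cost that search-tree algorithms aim to avoid, which is why reducing the depth of the tree and pruning it matter in practice.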
Visual Network Analysis of Dynamic Metabolic Pathways
We extend our previous work on the exploration of static metabolic networks to evolving, and therefore dynamic, pathways. We apply our visualization software to data from a simulation of early metabolism. Thereby, we show that our technique allows us to test and argue for or against different scenarios for the evolution of metabolic pathways. This supports a profound and efficient analysis of the structure and properties of the generated metabolic networks and their underlying components, while giving the user a vivid impression of the dynamics of the system. The analysis process is inspired by Ben Shneiderman’s mantra of information visualization. For the overview, user-defined diagrams give insight into topological changes of the graph as well as changes in the attribute set associated with the participating enzymes, substances and reactions. This way, “interesting features” in time as well as in space can be recognized. A linked view implementation enables the navigation into more detailed layers of perspective for in-depth analysis of individual network configurations.
Application of Conformal Prediction in QSAR
Part 4: First Conformal Prediction and Its Applications Workshop (COPA 2012)
QSAR modeling is a method for predicting properties, e.g. the solubility or toxicity, of chemical compounds using statistical learning techniques. QSAR is in widespread use within the pharmaceutical industry to prioritize compounds for experimental testing or to alert for potential toxicity. However, predictions from a QSAR model are difficult to assess if their prediction intervals are unknown. In this paper we introduce conformal prediction into the QSAR field to address this issue. We apply support vector machine regression in combination with two nonconformity measures to five datasets of different sizes to demonstrate the usefulness of conformal prediction in QSAR modeling. One of the nonconformity measures provides prediction intervals with almost the same width as the size of the QSAR models’ prediction errors, showing that the prediction intervals obtained by conformal prediction are efficient and useful.
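To make the idea of a conformal prediction interval concrete, here is a minimal sketch of split (inductive) conformal regression. The abstract does not say which two nonconformity measures the authors used, so this sketch assumes the simplest common choice, the absolute residual on a held-out calibration set. The function name `conformal_interval` and the toy numbers are hypothetical, and the underlying regressor (an SVM in the paper) is represented only by its residuals and a point prediction.

```python
import math


def conformal_interval(calibration_residuals, prediction, confidence):
    """Symmetric prediction interval from absolute calibration residuals.

    calibration_residuals: y_true - y_pred on a held-out calibration set.
    confidence: target coverage in (0, 1), e.g. 0.8.
    """
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    # Rank of the calibration score that guarantees the requested coverage.
    k = math.ceil((n + 1) * confidence)
    if k > n:
        # Too few calibration points for this confidence level.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (prediction - q, prediction + q)


# Toy calibration residuals (assumed) and a point prediction of 5.0.
residuals = [0.5, -0.1, 0.9, -0.3, 0.7, 0.2, -0.8, 0.4, 0.6]
low, high = conformal_interval(residuals, prediction=5.0, confidence=0.8)
print(low, high)
```

With this nonconformity measure every compound receives an interval of the same width. Measures that normalize the residual by an estimate of prediction difficulty give intervals that vary per compound, which may be closer to what the abstract calls efficient intervals.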
Thermodynamic Properties Of Asphaltenes: A Predictive Approach Based On Computer Assisted Structure Elucidation And Atomistic Simulations
INTRODUCTION
Crude oil is a complex mixture of hydrocarbons and heteroatomic organic compounds of varying molecular weight and polarity [1]. A common practice in the petroleum industry is to separate crude oil into four chemically distinct fractions: saturates, aromatics, asphaltenes and resins [1–4]. Asphaltenes are operationally defined as the non-volatile and polar fraction of petroleum that is insoluble in n-alkanes (e.g., pentane). Conversely, resins are defined as the non-volatile and polar fraction of crude oil that is soluble in n-alkanes (e.g., pentane) and aromatic solvents (e.g., toluene) and insoluble in ethyl acetate. A commonly accepted view in petroleum chemistry is that asphaltenes form micelles which are stabilized by adsorbed resins kept in solution by aromatics [5,6]. Two key parameters that control the stability of asphaltene micelles in a crude oil are the ratio of aromatics to saturates and that of resins to asphaltenes.
Virtual porous carbons: what they are and what they can be used for
We use the term “virtual porous carbon” (VPC) to describe computer-based molecular models of nanoporous carbons that go beyond the ubiquitous slit pore model and seek to engage with the geometric, topological and chemical heterogeneity that characterises almost every form of nanoporous carbon. A small number of these models have been developed and used since the early 1990s. These models and their use are reviewed. Included are three more detailed examples of the use of our VPC model. The first is concerned with the study of solid-like adsorbate in nanoporous carbons, the second with the absolute assessment of multi-isotherm based methods for determining the fractal dimension, and the final one is concerned with the fundamental study of diffusion in nanoporous carbons. M. J. Biggs and A. But